A four layer sharing HMM system for very large vocabulary isolated word recognition
Abstract
This paper reports on a large-vocabulary, speaker-independent isolated word recognizer targeting 50,000 words. The system supports a unique four-layer sharing structure for either continuous HMMs or discrete HMMs. Evaluation is performed using a dictionary of 5000 US city names, a dictionary of the 5000 most frequent English words, a dictionary of 50,000 English words, and the 110,000-word CMU English dictionary. For these dictionaries, top-3 recognition accuracy ranges from 90% to 93%.

The speech signal is a one-dimensional waveform, as shown in FIGURE 1. It may be labeled with a sequence of phonemes, and a word may correspond to one or more consecutive phonemes. FIGURE 1 also shows an example of the phoneme labels of the isolated word "item".

FIGURE 1: Speech waveform and phonemes of the isolated word "item".

The Speech Recognition Engine (SRE) described here models the speech waveform with a left-to-right HMM process, as shown in FIGURE 2. The figure displays a simple three-state left-to-right HMM of a phoneme whose context includes the left and right phones; that is, the HMM is context dependent. This type of HMM is chosen because it allows flexible state sharing at the first, second, and last states of the HMM, as explained below. A sequence of HMMs corresponds to a sequence of phonemes. Observations are emitted from each state of the HMM process, and their probabilities are given by a probability distribution $b_j(\cdot)$, where $j$ refers to a state in an HMM. Each state transition, shown as an arc in FIGURE 2, is associated with a state transition probability $a_{ij}$, the probability of taking the arc from state $i$ to state $j$.

FIGURE 2: A three-state left-to-right HMM used in SRE for all the phonemes in a particular context.
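To make the left-to-right structure concrete, here is a minimal sketch of how concatenated phoneme HMMs can score an isolated word with the forward algorithm. It assumes discrete (VQ) observations in a single stream, states that either loop on themselves or advance to the next state, and made-up probabilities. `PhoneHMM`, `concatenate`, and `forward_log_likelihood` are hypothetical names, not the SRE's interface, and no state sharing is modeled.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PhoneHMM:
    """Hypothetical three-state left-to-right phone model with discrete outputs.

    self_loop[i] is the self-transition probability a_ii of state i; the only
    other arc leaves state i for state i + 1 with probability 1 - a_ii.
    emit[i][k] is the probability that state i emits VQ symbol k.
    """
    self_loop: List[float]
    emit: List[List[float]]


def concatenate(phones: Sequence[PhoneHMM]):
    """Chain phone HMMs into one left-to-right word model."""
    loops: List[float] = []
    emits: List[List[float]] = []
    for phone in phones:
        loops.extend(phone.self_loop)
        emits.extend(phone.emit)
    return loops, emits


def forward_log_likelihood(phones: Sequence[PhoneHMM], symbols: Sequence[int]) -> float:
    """Return log P(symbols | word) using the scaled forward algorithm.

    Paths must start in the first state and occupy the last state at the
    final frame (no explicit exit probability is applied).
    """
    loops, emits = concatenate(phones)
    n = len(loops)
    alpha = [0.0] * n
    alpha[0] = emits[0][symbols[0]]
    log_prob = 0.0
    for t, sym in enumerate(symbols):
        if t > 0:
            new = [0.0] * n
            for j in range(n):
                stay = alpha[j] * loops[j]
                move = alpha[j - 1] * (1.0 - loops[j - 1]) if j > 0 else 0.0
                new[j] = (stay + move) * emits[j][sym]
            alpha = new
        # Rescale every frame to avoid underflow; accumulate the log scale.
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    if alpha[-1] == 0.0:
        return float("-inf")
    return log_prob + math.log(alpha[-1])


uniform = PhoneHMM(self_loop=[0.5, 0.5, 0.5], emit=[[0.5, 0.5]] * 3)
# Three valid paths for four frames, each with probability 0.5**7,
# so this prints approximately 0.0234375.
print(math.exp(forward_log_likelihood([uniform], [0, 1, 0, 1])))
```

A recognizer in this style would compute the score for every dictionary entry and keep the highest-scoring words (for example the top 3). Per-frame rescaling keeps the computation numerically stable for long utterances.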
Suppose there are $S$ types of observations, and $b_{js}(\cdot)$ denotes the distribution of state $j$ and type (or stream) $s$. Then

$$b_j(o_t) = \prod_{s=1}^{S} b_{js}(o_{ts}) \tag{1}$$

where $o_{ts}$ is the stream-$s$ component of the observation at frame $t$. The observations can be handled in two ways, yielding either a discrete observation HMM (DHMM) or a continuous observation HMM (CHMM). For the DHMM, the probability distribution $b_{js}$, defined in Eq. (2), is a one-dimensional array in which each scalar denotes the probability of observing a particular vector-quantized symbol in state $j$. It is composed of sub-probability distributions, each of which is a component of $b_{js}$, and the equation also specifies the total number of these components. The sub-probability distribution is introduced …
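Equation (1) is usually evaluated in the log domain, where the product over streams becomes a sum. The sketch below assumes the DHMM case, in which each per-stream distribution $b_{js}$ (state $j$, stream $s$) is a one-dimensional table indexed by a VQ symbol, and it illustrates sharing only in its simplest form: several states point at one shared set of tables through an index. `Pool`, `log_obs_prob`, the stream and codebook sizes, and the state labels are hypothetical; the paper's four-layer structure and its sub-probability distributions are not modeled here.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical pool of shared discrete distributions: id -> one table per stream.
# Each table is a 1-D array indexed by that stream's VQ symbol (the DHMM case).
Pool = Dict[int, List[List[float]]]


def log_obs_prob(state_to_dist: Dict[str, int], pool: Pool,
                 state: str, symbols: Sequence[int]) -> float:
    """Return log b_j(o_t) = sum over streams s of log b_js(o_ts), the log of Eq. (1).

    symbols[s] is the VQ symbol observed in stream s at the current frame.
    States mapped to the same id share (tie) one set of tables.
    """
    tables = pool[state_to_dist[state]]
    if len(tables) != len(symbols):
        raise ValueError("expected one VQ symbol per stream")
    total = 0.0
    for table, sym in zip(tables, symbols):
        p = table[sym]
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total


# Two hypothetical streams with codebooks of size 3 and 2.
pool: Pool = {0: [[0.5, 0.25, 0.25], [0.1, 0.9]]}
# Two hypothetical context-dependent states tied to the same distribution id.
state_to_dist = {"ay_state1": 0, "ey_state1": 0}
print(math.exp(log_obs_prob(state_to_dist, pool, "ay_state1", [0, 1])))  # about 0.45
```

Because tied states resolve to the same tables, storage and training data are pooled across them, which is the general motivation for sharing in large-vocabulary systems.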
Similar resources
MAN-MACHINE INTERACTION SYSTEM FOR SUBJECT INDEPENDENT SIGN LANGUAGE RECOGNITION USING FUZZY HIDDEN MARKOV MODEL
Sign language recognition (SLR) has attracted growing interest in the human–computer interaction community. The major challenge that SLR now faces is developing methods that scale well with increasing vocabulary size, given a limited set of training data, for signer-independent applications. Automatic SLR based on hidden Markov models (HMMs) is very sensitive to gesture's shape inf...
RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts
We present a novel large vocabulary OCR system, which implements a confidence- and margin-based discriminative training approach for model adaptation of an HMM based recognition system to handle multiple fonts, different handwriting styles, and their variations. Most current HMM approaches are HTK based systems which are maximum-likelihood (ML) trained and which try to adapt their model...
Development of HMM/Neural Network-Based Medium-Vocabulary Isolated-Word Lithuanian Speech Recognition System
This paper describes the development of a Lithuanian HMM/ANN speech recognition system, which combines artificial neural networks (ANNs) and hidden Markov models (HMMs). A hybrid HMM/ANN architecture was applied in the system. In this architecture, a fully connected three-layer neural network (a multi-layer perceptron) is trained by the conventional stochastic backpropagation algorithm to esti...
Nonreciprocal data sharing in estimating HMM parameters
Parameter tying is often used in large vocabulary continuous speech recognition (LVCSR) systems to balance model resolution and generalizability. However, one consequence of tying is that the differences among tied constructs are ignored. Parameter tying can alternatively be viewed as reciprocal data sharing, in that a tied construct uses data associated with all others in its tied class. To ...
Confidence measures for hybrid HMM/ANN speech recognition
In this paper we introduce four acoustic confidence measures which are derived from the output of a hybrid HMM/ANN large vocabulary continuous speech recognition system. These confidence measures, based on local posterior probability estimates computed by an ANN, are evaluated at both phone and word levels, using the North American Business News corpus.